Building a Large Scale Knowledge Base from Chinese Wiki Encyclopedia
Abstract
DBpedia has proved to be a successful structured knowledge base, and large-scale Semantic Web data has been built by using DBpedia as the central interlinking hub of the Web of Data in English. In Chinese, however, because of the heavy imbalance in size (no more than one tenth) between the English and Chinese editions of Wikipedia, little Chinese linked data has been published and linked to DBpedia, which hinders structured knowledge sharing both within Chinese resources and across languages. This paper aims to build a large-scale Chinese structured knowledge base from Hudong, one of the largest Chinese wiki encyclopedia websites. An upper-level ontology schema in Chinese is first learned from the category system and Infobox information in Hudong. In total, 19542 concepts are inferred and organized in a hierarchy with at most 20 levels, and 2381 properties with domain and range information are learned from the attributes in the Hudong Infoboxes. Then, 802593 instances are extracted and described using the concepts and properties of the learned ontology. These instances cover a wide range of things, including persons, organizations, places and so on. Among them, 62679 are linked to identical instances in DBpedia. Moreover, the paper provides access to the established Chinese knowledge base through an RDF dump or SPARQL. The general upper-level ontology and wide coverage make the knowledge base a valuable Chinese semantic resource. It can be used not only in building Chinese linked data, the fundamental work for building a multilingual knowledge base across heterogeneous resources in different languages, but can also greatly facilitate many useful applications of large-scale knowledge bases, such as knowledge question answering and semantic search.
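To make the structure described in the abstract more concrete, the following is a minimal sketch of how a concept hierarchy, a property with domain and range, a typed instance, and a link to DBpedia could be written out as N-Triples. It is not the paper's implementation. The `http://example.org/zhkb/` namespace, the concept and property names, the sample instance, and the DBpedia resource name are all hypothetical placeholders; only the RDF, RDFS, and OWL vocabulary URIs are standard.

```python
# Illustrative sketch only. The namespace, concept names, property name,
# instance, and DBpedia resource below are hypothetical and are not taken
# from the paper or from Hudong.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/zhkb/"  # hypothetical namespace


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all IRIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def depth(concept, parent_of):
    """Count hierarchy levels from a concept up to its root (root = 1).

    Assumes the hierarchy is acyclic and each concept has one parent.
    """
    levels = 1
    while concept in parent_of:
        concept = parent_of[concept]
        levels += 1
    return levels


# Toy concept hierarchy: child -> parent.
parent_of = {"Scientist": "Person", "Person": "Thing", "Place": "Thing"}

# Concept hierarchy as rdfs:subClassOf statements.
triples = [
    triple(EX + child, RDFS_SUBCLASS_OF, EX + parent)
    for child, parent in parent_of.items()
]

# One property with domain and range, one typed instance, and one
# owl:sameAs link to a (hypothetical) identical DBpedia resource.
triples += [
    triple(EX + "birthPlace", RDFS_DOMAIN, EX + "Person"),
    triple(EX + "birthPlace", RDFS_RANGE, EX + "Place"),
    triple(EX + "Example_Person", RDF_TYPE, EX + "Scientist"),
    triple(EX + "Example_Person", OWL_SAME_AS,
           "http://dbpedia.org/resource/Example_Person"),
]

max_depth = max(depth(c, parent_of) for c in parent_of)

print("\n".join(triples))
print("maximum hierarchy depth:", max_depth)
```

In this toy data the deepest chain is Scientist, Person, Thing. The abstract's "at most 20 levels" would correspond to the same kind of count over the full learned hierarchy, assuming the root is counted as level 1.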
Related resources
A Novel Online Encyclopedia-Oriented Approach for Large-Scale Knowledge Base Construction
In the process of constructing a large-scale knowledge base, the manual construction approach lacks efficiency as well as flexibility. Therefore, automatically extracting massive knowledge from online encyclopedias has attracted attention from an increasing number of scholars. Current research is mainly focused on extracting data from English online encyclopedias, whereas research about...
Wisdom of the Crowds: Decentralized Knowledge Construction in Wikipedia
Recently, Nature published an article comparing the quality of Wikipedia articles to those of Encyclopedia Britannica (Giles 2005). The article, which gained much public attention, provides evidence for Wikipedia quality, but does not provide an explanation of the underlying source of that quality. Wikipedia, and wikis in general, aggregate information from a large and diverse author-base, wher...
A Proposal for a Gene Functions Wiki
Large knowledge bases integrating different domains can provide a foundation for new applications in biology such as data mining or automated reasoning. The traditional approach to the construction of such knowledge bases is manual and therefore extremely time consuming. The ubiquity of the internet now makes large-scale community collaboration for the construction of knowledge bases, such as t...
Using Knowledge Wikis to Support Scientific Communities
With the success of numerous Web 2.0 applications, interest in web-based support of scientific communities has also gained significant relevance. The concept of wikis, one building block of the Web 2.0, has been shown to be a reasonable infrastructure for sharing and refining any kind of knowledge. The most prominent example is the encyclopedia Wikipedia, but many smaller wiki applicatio...
Cross-language and Cross-encyclopedia Article Linking Using Mixed-language Topic Model and Hypernym Translation
Creating cross-language article links among different online encyclopedias is now an important task in the unification of multilingual knowledge bases. In this paper, we propose a cross-language article linking method using a mixed-language topic model and hypernym translation features based on an SVM model to link English Wikipedia and Chinese Baidu Baike, the most widely used Wiki-like encycl...